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Transcriptomics has the potential to discover new RNA virus genomes by sequencing total intracellular RNA 
pools. In this study, we have searched publicly available transcriptomes for sequences similar to viruses of the 
Nidovirales order. We report two potential nidovirus genomes, a highly divergent 35.9 kb likely complete genome 
from the California sea hare Aplysia califomica, which we assign to a nidovirus named Aplysia abyssovirus 1 
(AAbV), and a coronavirus-like 22.3 kb partial genome from the ornamented pygmy frog Microhyla fissipes, 
which we assign to a nidovirus named Microhyla alphaletovirus 1 (MLeV). AAbV was shown to encode a 
functional main proteinase, and a translational readthrough signal. Phylogenetic analysis suggested that AAbV 
represents a new family, proposed here as Abyssoviridae. MLeV represents a sister group to the other known 
coronaviruses. The importance of MLeV and AAbV for understanding nidovirus evolution, and the origin of 
terrestrial nidoviruses are discussed. 


1. Introduction 

Until recently, discovery of new RNA viruses proceeded slowly in a 
mostly hypothesis-driven manner while searching for an agent of a 
disease, and using antibody cross-reactivity or enough conserved motifs 
for successful amplification by reverse transcriptase polymerase chain 
reaction. With improvements in RNA transcriptome sequencing and 
homology-based search methods, it is now possible to capture the 
complete infecting RNA virome of an organism by deep-sequencing 
total intracellular RNA pools (Miranda et al., 2016; Shi et al, 2018, 
2016). 

The new sequencing methods have brought a great change to the 
Nidovirales, an order that includes viruses with complex replicase 
polyproteins and the largest known RNA genomes (Lauber et al., 2013). 
This order previously contained four family-level groups, the Cor- 
onaviridae which infect birds and mammals including humans, the Ar- 
teriviridae which infect non-human mammals, the Mesoniviridae which 
infect arthropods, and the Roniviridae which infect crustaceans (Lauber 
et al., 2013). However, recent papers (Lauck et al., 2015; O’Dea et al., 


2016; Saberi et al., 2018; Shi et al., 2018, 2016; Tokarz et al, 2015; 
Vasilakis et al., 2014; Wahl-Jensen et al., 2016) and our results (see 
below) have added to within-family diversity and revealed several 
highly divergent nido-like viruses which the Nidovirales Study Group 
proposed, pending ICTV ratification, to form four new virus families 
within the Nidovirales (Gorbalenya et al., 2017a). 

In this report we describe the discovery and characterization of one 
of the nidoviruses prototyping a new family along with another puta¬ 
tive nidovirus. We used BLAST searches to scan the publicly available 
transcriptomes and expressed sequence tag libraries available at the US 
National Center for Biotechnology Information, and revealed two novel 
nido-like virus sequences from the frog Microhyla fissipes developmental 
transcriptome (Zhao et al., 2016) and from several transcriptome stu¬ 
dies dealing with the marine gastropod Aplysia califomica ( Fiedler et al., 
2010; Heyland et al., 2011; Moroz et al., 2006). We describe the 
bioinformatics of the new virus-like sequences, and demonstrate that 
the Aplysia virus-like sequence encodes a functional proteinase, and a 
translational termination-suppression signal. Implications for nidovirus 
evolution and the origin of nidovirus structural proteins are discussed. 
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2. Results 

2.1. Virus discovery 

Recent studies have identified a wide variety of virus-like sequences 
in intracellular RNA pools, but few new members of the Nidovirales 
have been reported compared to groups such as the Picomavirales. In 
order to determine whether additional lineages of nido-like viruses 
might be present, tBLASTn (Altschul et ah, 1990) was used to search the 
transcriptome shotgun assembly (TSA) and expressed sequence tag 
(EST) databases for sequences encoding proteins similar to the main 
proteinase (M pro ), polymerase and helicase, or complete pplb regions 
of the nidovirus strains Infectious bronchitis virus, Gill-associated virus, 
White bream virus, Cavally virus and Wobbly possum disease virus. The 
tBLASTn results were checked by using BLASTx to compare each result 
to the non-redundant protein database, and results that matched back 
to any member of the Nidovirales were selected for further analysis. 
This led to the discovery of a 35.9 kb transcript and 243 other fragments 
from the California sea hare, Aplysia califomica, and a 22.3 kb transcript 
from Microhyla fissipes, known as the ornamented pygmy frog. Putative 
virus transcripts were then compared to DNA sequences from the same 
organisms by nucleotide BLAST, and no evidence of either virus was 
found. Together, these tests suggest that both nidovirus-like transcripts 
most likely come from RNA viruses associated with host transcriptomes. 

2.2. Phylogenetic analysis 

Phylogenetic analysis was performed by IQ Tree 1.5.5 (Nguyen 
et al., 20 IT ) using five protein domains universally conserved in known 
and proposed nidoviruses plus the virus-like sequences described in this 
study (see below). The produced maximum-likelihood tree was mid¬ 
point rooted to reveal two strongly-supported super-clades, consisting 
of four strongly-supported major clades corresponding to arteri-like 
viruses, toro-like viruses, corona-like viruses, and invertebrate nido¬ 
viruses ( 7 ig. 1). A Bayesian rooted tree (not shown) was also con¬ 
structed using the same viral sequences, and it yielded the same four 
major clades, but with weaker support values on some branches and a 
basal position of the arteri-like major clade. Together these results 
suggest that the novel virus-like sequences likely represent distantly 
related members of the Nidovirales, but the tree branch uncertainty also 
demonstrates the limitations of these phylogenetic approaches in 
dealing with the extreme diversity of the sparsely sampled nido-like 
viruses. 

The virus-like sequence from Aplysia califomica formed a relatively 
long and moderately supported branch that clustered with other in¬ 
vertebrate nidoviruses, forming a sister group to a clade consisting of 
the Mesoniviridae and a recently discovered nidovirus from the marine 
snail Turritella, TurrNV. The virus-like sequence from Microhyla fissipes 
clustered with strong support as a sister group to the known 
Coronavirinae. We named these putative viruses Aplysia abyssovirus 
(AAbV) and Microhyla letovirus (MLeV), respectively. 

While we were expressing viral proteins to biologically validate the 
new sequences and preparing this manuscript, a second manuscript 
appeared on BioRxiv (Debat, 2018) from Humberto Debat who was 
describing the same Aplysia virus from the same source material, posted 
April 24th, 2018, where it is called Aplysia califomica nido-like virus. 
That report covers the tissue tropism and age-dependent prevalence of 
the Aplysia virus thoroughly, so in this manuscript we will focus on 
bioinformatics analysis and biological validation of this virus. It is our 
opinion that the name Aplysia califomica nido-like virus should be re¬ 
garded as an alternate name to Aplysia abyssovirus. 

2.3. Naming and etymology 

After assigning AAbV and MLeV to nidoviruses by the above 
bioinformatics analysis, the genome sequences were submitted to the 
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Fig. 1. Nidovirus phylogeny reconstructed based on concatenated MSA of 
five replicative domains universally conserved in nidoviruses. SH-aLRT 
branch support values are depicted by shaded circles. Species names that are 
not currently recognized by ICTV are written in plain font. Asterisks designate 
viruses described in this study. 


Nidovirus Study Group (NSG) of the International Committee on the 
Taxonomy of Viruses (ICTV) for their accommodation in the nidovirus 
taxonomy; BN, senior author of this manuscript, is a member of the NSG 
and AAG assisted NSG with analysis of these viruses. Classification of 
these and other viruses were described in several taxonomic proposals 
that were made publicly available in the pending proposals section of 
ICTV on June 23rd, 2017, revised on November 26th, 2017 
(Gorbalenya et al., 2017b, 2017a; Ziebuhr et al., 2017) and August 12, 
2018. They were approved by the ICTV Executive Committee in July 
2018 and will be placed for ratification by ICTV in 2018. Throughout 
this report, we will follow the taxa naming and taxonomy from the 
pending ICTV taxonomic proposals cited above, which we interpret to 
establish priority in discovering and naming these viruses and estab¬ 
lishing the respective taxa. 

The etymology of the name abyssovirus is from the word abyss, a 
reference to the aquatic environment where Aplysia lives, to the 
Sumerian god of watery depths Abzu, and to its discovery in an RNA 
transcriptome obtained by “deep” sequencing technology. Based on 
relatively low amino acid identity to the other families in the 
Nidovirales, it is our opinion that AAbV prototypes a new nidovirus 
family, which was confirmed in the analysis described in the pending 
proposal. The NSG has also accepted our proposal to name the new 
family Abyssoviridae, the new genus Alphaabyssovirus and the new 
species Aplysia abyssovirus 1. 

The etymology of the name letovirus is in reference to the source of 
the virus in frogs, and their connection to the mythological Leto, 
daughter of the titans Coeus and Phoebe. In the story, Leto turned some 
inhospitable peasants into frogs after they stirred up the mud at the 
bottom of a pool so that she could not drink from it. Based on the low 
sequence identity but high conservation of domains found in the 
Coronavirinae, it is our opinion that MLeV is a member of a sister group 
to all known coronaviruses, but still within the Coronavirinae. Based on 
our input, the NGS named the new genus Alphaletovirus in the pending 
proposal. 
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Fig. 2. Sequence coverage of AAbV in public NCBI libraries. (A) Examples of the host organism Aplysia califomica at swimming veliger, settled, metamorphic, 
juvenile and adult developmental stages (images not to scale, adapted from Heyland et al. (2011) and Moroz et al. (2006)). Summary of distinct sequence assemblies 
and reads in the TSA (B) and EST (C) matching AAbV for which the nucleotide BLAST E value was 2x1(T 70 or smaller. (D) Map of AAbV, showing the location of the 
replicase polyprotein genes (ORFla, ORFlb), structural polyprotein gene (ORF2) and poly-adenosine tail (A n ). The position of sequences from the TSA (E) and EST 
(F) databases matching AAbV is shown. 


2.4. AAbV genome and subgenome sequences and their potential expression 

The host of AAbV is shown in ’ig. 2A. The virus was recovered from 
a variety of adult tissues, and from several developmental stages of the 
host organism, as described elsewhere (Debat, 2018). Fragments of 
AAbV were detected in 9 TSA and 9 EST databases, compiled over 
several years by three labs working in Florida and the UK ( "ig. 2B-C). 

The AAbV genome is represented in its longest and most complete 
available form by the transcriptome shotgun assembly sequence 
GBBW01007738 which represents a reverse-complementary genomic 
sequence. Remarkably, the organization of the AAbV genome has sev¬ 
eral features typical for viruses of the Alphavirus genus of the 
Togaviridae family (King et al., 2012) that could be contrasted with 
those conserved in the nidoviruses. They include: a) two in-frame open 
reading frames (ORFs; ORFla and ORFlb) of the replicase gene that are 
separated by a stop codon rather than overlapping and including a ni- 
dovirus-like ribosomal frameshift signal in the overlap, and b) a single 
structural polyprotein gene (ORF2) rather than several ORFs encoding 
structural proteins. The 35913 nt long AAbV genome has a 74 nt 5'- 
untranslated region, a 964 nt 3'-untranslated region, and a short poly-A 
tail ( 7 ig. 2D). Despite these alphavirus-like features, BLASTx analysis 
confirmed that the AAbV replicase polyprotein clusters with the Nido- 
virales, as depicted in Fig. 1. Each part of the genome is represented in 
3-20 independent sequences from the TSA and EST databases available 
at www.ncbi.nlm.nih.gov as of November 26th, 2017 (Fig. 2E-F). The 
AAbV genome ( ’ig. 3A) is the second-largest currently reported RNA 
virus genome, behind a new 41.1 kb planarian nidovirus described in a 
BioRxiv manuscript (Saberi et al., 2018). 

The sequence of the genomic 5'-terminus is supported by the five 
assemblies (GBBW01007738, GAZL01021275, GBDA01037198, 
GBCZ01030948, and GBCZ01030949) that end within one nucleotide 
of each other. The EST sequence EB188990 contains the same sequence 
with an additional S'-GGCTCGAG-S' that may represent part of the 5'- 
terminal region missing from GBBW01007738. However, we prefer to 
side with the preponderance of sequence data and consider 
GBBW01007738 the most complete AAbV genome available until fur¬ 
ther biological evidence emerges. 

The sequence of the 3'-terminus is supported by 6 TSA sequence 
assemblies and 1 EST sequence that all end within one nucleotide of 
each other. Every part of the genome is represented in at least three TSA 
sequence assemblies. Genome coverage is more abundant at the 3'-end, 
which could be evidence of 3'-coterminal subgenomic RNA species, or 
could be a result of the method used to prepare cDNA. 

Genetic variation among these sequences is as follows. There are 
four short EST sequences which appear to join different discontinuous 
regions of the genome together, but the joins occur at different posi¬ 
tions in the middle of genes and cannot be explained by nidovirus-like 
discontinuous transcription. These oddly joined sequence fragments 


likely represent either defective RNA species (Airuya et al., 1993), or 
artifacts of the EST preparation process. Two sequence assemblies dif¬ 
fered from the others, with A replacing G at nucleotide 1627, and in 
another assembly A replacing the consensus G at position 28005, both 
of which could be attributed to natural mutations or the actions of host 
cytidine deaminase on the viral minus strand. There is also some var¬ 
iation in the preserved poly-A tail sequences, presumably from the 
difficulty of accurately reading long stretches of a single nucleotide. 

In order to test whether there was support for AAbV subgenomic 
RNA species in the raw sequence data, individual sequence reads were 
mapped to the AAbV genome using Bowtie 2.3.4.1 (Langmead and 
Salzberg, 2012) and SAMtools 1.9 (Li et al., 2009). There was no a 
noticeable change in read depth at the junction between ORFla and 
ORFlb, but there was a sudden increase of about seven-fold in read 
depth immediately before the start of ORF2 (Fig. 3B), suggesting that 
ORF2 may be expressed from a subgenomic mRNA produced in relative 
abundance compared to the genomic RNA, as would be expected for a 
member of the Nidovirales. Numerous low-frequency AAbV sequence 
variants were identified in the raw sequence data, but none were con¬ 
sistent across all datasets, and no indels were consistently present 
within 1000 nucleotides of the start of ORF2. This was interpreted to 
indicate that either the viral subgenomic mRNA did not contain the 
expected nidovirus-like leader-body structure, or that any potential 5'- 
terminal leader sequences were not captured in the raw data. 

Nidoviruses express their structural and accessory proteins via a set 
of 3'-coterminal nested subgenomic RNAs, which are produced by 
discontinuous transcription on the genomic template. In this process, 
the polymerase is thought to pause at transcription-regulatory se¬ 
quences located upstream of each gene, occasionally resulting in a 
template switch to homologous transcription-regulatory sequence in 
the viral 5'-untranslated region to produce negative-stranded RNAs of 
subgenomic size (Sola et al., 2015). The longest sequence match be¬ 
tween the 5'-untranslated region and intergenic sequence of AAbV is 
shown in Fig. 3C. It consists of six of eight identical nucleotides, which 
could form eight base pairs with a reverse-complementary viral minus 
strand due to the possibility of both A-U and G-U wobble base pairing. 
However, none of the available TSA or EST sequences showed direct 
evidence of a subgenomic RNA species, such as a consistently-spliced 
transcript, or a large number of sequence reads that stop at the putative 
transcription-regulatory sequence. This sequence AAACGATG or AAA 
CGGTA needs to be investigated further to determine whether it func¬ 
tions as a transcription-regulatory sequence for viral subgenomic RNA 
production. 

Together these data suggest that the AAbV genome is reasonably 
complete, robust, and represents a novel and exceptionally large nido- 
like virus. It has the unusual genome organization which is nonetheless 
consistent with the canonical nidovirus features of large replicase 
polyproteins la and lab, ppla and pplab, respectively. They are 


162 




























K. Bukhari et al. 


Virology 524 (2018) 160-171 






Aplysia abyssovirus 1 

Protein 

Region 

Input 

Tool 

Probability 

E value 

Model 

Note 

Main protease 

ppla 

4401-4618 

Abyssovirus 1 only 

HHPred 

96% 

0.044 

3D23 

Main protease, Human coronavirus HKU1 

Conserved ppla 

ppla 

5386-5629 

Abyssovirus 1 only 

Protein BLAST 


1.0 x 10- 7 

NC_032496 

Unknown function, Turrinivirus 1 

NiRAN 

pplab 

5953-6087 

Abyssovirus 1 only 

HHPred align 

96% 

9.4 x 10-8 

NP_828869 

nsp12 NiRAN & RdRP, SARS-CoV 

RNA dependent RNA polymerase 

pplab 

6497-6893 

Abyssovirus 1 only 

HHPred 

99% 

3.9 x 10-13 

2CKW 

RNA dependent RNA polymerase, Sapporo virus 

Metal binding helicase/NTPase 

pplab 

6963-7596 

Abyssovirus 1 only 

HHPred 

100% 

2.6 x 10-65 

5WWP 

nsp13, Middle east respiratory syndrome-related-CoV 

Exonuclease/N7 Methyltransferase 

pplab 

7628-7940 

Alignment with Turrinivirus 1 

HHPred 

100% 

1.4 x 10-99 

5C8T 

nsp14 Exonuclease & N7 Methyltransferase, SARS-CoV 

Exonuclease region 

pplab 

7655-7913 

Abyssovirus 1 only 

HHPred 

87% 

0.39 

5C8T 

nsp14 Exonuclease & N7 Methyltransferase, SARS-CoV 

N7 Methyltransferase region 

pplab 

7891-8056 

Abyssovirus 1 only 

HHPred 

32% 

41 

2P8J 

SAM-dependent methyltransferase, Clostridium acetobutylicum 

20 Methyltransferase 

pplab 

8248-8353 

Abyssovirus 1 only 

HHPred align 

67% 

3.2 x 10-5 

YP_001661452 

nsp16 2'-0-methyltransferase, Gill-associated virus 

Structural Protease 

ppS 

276-471 

Abyssovirus 1 only 

HHPred 

99% 

1.0 x 10-13 

2R0L 

HGFA family SI serine protease, Human 


Fig. 3. Coding capacity, depth of coverage and bioinformatics of AAbV. (A) Genome and coding capacity of AAbV and SARS-CoV are shown to scale. (B) Total 
depth of coverage based on a sample of 672017 aligned spots matching AAbV from Aplysia californica RNA sequence read archives including SRR385787, 
SRR385788, SRR385792, SRR385793, SRR385795, SRR385800, SRR385802 and SRR385815. The putative start site of a viral subgenomic RNA species is marked 
with an arrow. (C) Alignment of the 5'-untranslated region and the intergenic sequence between the pplb and pp2 genes showing a potential transcription-regulatory 
sequence (boxed). (D) Bioinformatic assignment of domains in AAbV. Sequence(s) used for prediction (Input) were either AAbV alone or a multiple sequence 
alignment containing AAbV and TurrNV. Probability score from HHPred and E value from HHPred or BLAST are shown. Accession numbers are given for sequences 
or protein structures identified as a match for an AAbV domain (Model). 


expressed via a translational readthrough rather than frameshift me¬ 
chanism, while potential structural protein genes are presumably ex¬ 
pressed from a single subgenomic RNA to produce structural poly¬ 
protein pp2. 

2.5. AAbV protein bioinformatics 

To annotate the functional protein domains encoded in the AAbV 
genome, a series of bioinformatics tools were used. Wherever possible, 
we have followed the convention of SARS-associated coronavirus (SARS- 
CoV) species in naming domains and polyprotein processing products 
(Ref?). When run against the PDB database, HHPred (Soding et al., 
2005) predicts function based on structure. For domains like the 
polymerase where a nidovirus structure is not yet available, HHPred 
can sometimes detect a match to a homologous protein, such as the 
picornavirus polymerase. 

HHPred produced confident predictions for a coronavirus-like M pro 
(Anand et al., 2002) in ppla ( 7 ig. 3D). In pplb HHPred identified a 
picornavirus-like RNA-dependent RNA polymerase (RdRp (te Velthuis 
et al., 2009)), nspl3 metal-binding helicase (Deng et al., 2014; Ivanov 

et al., 2004), nidovirus-specific nspl4 exonuclease (ExoN (Ma et al., 
2015)) and nspl4 N7 methyltransferase (N7 MTase (Chen et al., 2009; 
Ma et al., 2011 )). In pp2, HHPred identified a chymotrypsin-like serine 
proteinase (Birktoft and Blow, 1972), a feature analogous to the al- 
phavirus capsid proteinase (Melancont and Garoff, 1987), but until now 
predicted in only one nidovirus, TurrNV. We have termed this the 
structural proteinase (S pro ). 

Where HHPred was unable to annotate a region, a protein BLAST 
search was carried out to identify likely homologs among other known 
nidoviruses. When a match was found, both proteins were aligned using 
Clustal Omega (Sievers et al., 2011), and the multiple sequence align¬ 
ment was used in HHPred. The most consistent matches to AAbV were 
from TurrNV. This identified a larger region and a more confident 
match to the coronavirus nspl4 ExoN-N7 MTase. 


Protein BLAST was used to map the AAbV nidovirus RdRp-asso- 
ciated nucleotidyl transferase (NiRAN) and nspl6 20-MTase domains 
to homologous domains from other nidoviruses. The corresponding 
regions of AAbV and the top protein BLAST match were then submitted 
to HHPred in align mode, which uses predicted structure and primary 
sequence data to compare proteins. This led to confident identifications 
of the NiRAN and a match for the divergent but functional 20 MTase 
domain of Gill-associated virus ( ieng et al, 2016). One other un¬ 
characterized domain was also identified in both AAbV and TurrNV by 
protein BLAST, in the position where the coronavirus conserved re¬ 
plication accessory proteins nsp7-10 were expected ( Tg. 3D). How¬ 
ever, there was not enough similarity between the AAbV-TurrNV con¬ 
served domain and other nidovirus domains to confidently assign a 
function to this region. 

We also looked for transmembrane regions which are typically 
clustered in three regions in nidovirus ppla. Domain-level maps of new 
and known nidoviruses ppla and pplb are shown in Figs. 4 and 5A, 
respectively. Nidoviruses typically have a cluster of an even number of 
transmembrane helices near the midpoint of ppla, equivalent to nsp3 of 
SARS coronavirus. Nidoviruses also have two other clusters of 2-8 
transmembrane helices flanking the M pro domain from both sides. 

AAbV is also missing some common but not universally conserved 
nidovirus domains. AAbV does not appear to encode a homolog of the 
uridylate-specific nidovirus endonuclease (NendoU), nor is there en¬ 
ough un-annotated protein sequence in pplb to accommodate an 
NendoU. This result is in line with the lack of this domain in other 
invertebrate nidoviruses (Nga et al., 2011). We were also not able to 
corroborate the prediction (Debat, 2018) of a papain-like proteinase 
domain situated among the predicted transmembrane regions of the 
first transmembrane cluster, or of a potential S-like domain of the 
structural polyprotein. 

The pp2 gene of AAbV encodes a putative structural polyprotein of 
3224 amino acids. HHPred and BLAST were not able to detect matches 
for any domains except S pro in AAbV pp2. TMHMM (Krogh et al., 2001) 
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Fig. 4. Comparison of predicted domain-level organization in polyprotein la of new viruses to previously described nidoviruses. Gaps have been introduced 
so to align predicted homologous domains. Virus naming and taxonomy conventions follow the ICTV proposals in which MLeV and AAbV were first described 
(Gorbalenya et al., 2017b, 2017a; Ziebuhr et al., 2017). New viruses are marked with stars, accepted taxonomic ranks are italicized and proposed taxonomic ranks 
are not italicized. Polyprotein processing products from SARS-CoV are shown at top. Domains are colored to indicate predicted similarity to SARS-CoV nspl (CoV 
nspl), SARS-CoV nsp2 (nsp2-like), ubiquitin (Ub-like), macrodomains, papain-like proteinase (PL pro ), first section of the coronavirus Y domain (CoV Yl), first section 
of the arterivirus Y domain (ArV Yl) corona virus-specific Y domain-like (CoV Y-like), carboxyl-terminal domain of coronavirus nsp4 (nsp4 CTD-like), region with 
PSIPRED predicted structural similarity to nsp4 CTD, main proteinase (M pro ), SARS-CoV nsp8-like (CoV nsp8), Equine arteritis virus nsp7a (ArV nsp7a), SARS-CoV 
nsplO (CoV nsplO), protein kinase-like (Kinase), RNA methyltransferase (Mtase), potential metal ion-binding clusters with 4 cysteine or histidine residues in a 20 
amino acid window (CH-cluster), transmembrane helices, hydrophobic transmembrane-like regions that may not span the membrane by analogy to coronavirus nsp4 
and nsp6 (TM-like) and disordered regions (Unstructured). 


predicted 13 transmembrane helices in pp2, which were generally ar¬ 
ranged in pairs with large intervening domains, which we have tenta¬ 
tively named S pro , predicted surface glycoproteins GP1-3 and a possible 
nucleoprotein ( ’ig. 5B). Included in pp2 are additional smaller domains 
that have not been named yet, pending a better understanding of pp2 
proteolytic processing. SignalP (Petersen et al., 2011) predicted an in¬ 
itial signal peptide at the extreme amino terminus, but after removing 
the predicted signal peptide and re-running the prediction with the “N- 
terminal truncation of input sequence” parameter set to zero, a total of 
six potential signal peptidase cleavage sites were detected. The identi¬ 
fication of the nucleoprotein-like domain is based on a resemblance to 
the N proteins of Bovine torovirus and Alphamesonivirus 1, and to the 
carboxyl-terminal half of the SARS-CoV N. The features the AAbV N- 
like protein shares with N of other established nidoviruses are an initial 
glycine-rich region that may be flexibly disordered, followed by a lysine 
and arginine-rich region from amino acid 2869-2913 that could facil¬ 
itate RNA binding, followed by a domain predicted by PSIPRED 
(Buchan et al., 2013) to contain a secondary structure profile similar to 
that of the Equine arteritis virus N and the SARS-CoV N carboxyl- 
terminal domain. We did not find strong evidence to support the ana¬ 
lysis of Debat (Debat, 2018) predicting a spike-like fold in GP3, but we 
concur with Debat in noticing that GP2 (and we would add, GP3) have a 
protein secondary structure profile that resembles an alphavirus El 
protein and the El-like protein of TurrNV. 

One previous report (Prince, 2003) had noted virus-like particles 
described as resembling intracellular alphavirus virions, that were 
widespread in transmission electron micrographs of Aplysia californica 
tissue, which would seem to be consistent with the alphavirus-like or¬ 
ganization of the structural poly protein and apparent El homology. 
However, further testing is necessary to confirm whether those virus¬ 
like particles are related to AAbV. 


2.6. AAbV proteinoses 

When identifying viruses through bioinformatics, there is a risk that 
the sequences are either mis-assembled, contain errors, or are artifacts 
of the sequencing and sequence assembly processes. We tested the 
function of some AAbV protein features to determine if any was bio¬ 
logically functional, as a way to better assess whether the AAbV 
genome represented a replicating virus encoding functional parts. 

The AAbV M pro and S pro plus surrounding regions up to the nearest 
preceding and following predicted transmembrane helix were cloned 
into pTriEx 1.1 and expressed with an amino-terminal herpes simplex 
virus epitope (HSV) tag, and a carboxyl-terminal poly-histidine (HIS) 
tag. Expressions were carried out by in vitro coupled T7 transcription 
and rabbit reticulocyte lysate translation. M pro cleavage at an amino- 
terminal site was detected by the presence of an approximately 16 kDa 
HSV-tagged fragment (Fig. 6), which would be expected if M pro clea¬ 
vage occurred in the vicinity of amino acid 4375, located near the start 
of the region of M pro homology at amino acid 4401 ( 7 ig. 3D). S pro was 
expressed, but did not produce any detectable cleavage products in the 
same assay (data not shown). From this we concluded that AAbV M pro 
appeared to have proteinase activity in the context of our expression 
construct, while our S pro construct did not. Further work will be needed 
to determine whether the failure of the putative S pro to cleave was a 
result of the construct boundaries, assay conditions, lack of an appro¬ 
priate substrate, or errors in the protein sequence. 

To further characterize the activity of AAbV M pro , alanine-scanning 
mutations were made to amino acids that appeared to match the cat¬ 
alytic cysteine and histidine residues of other coronavirus main pro- 
teinases. Mutation of the putative catalytic histidine H4429 did not 
strongly reduce proteolytic processing, while mutation of the cysteine 
C4538 blocked proteinase activity (Fig. 6). These data demonstrate that 
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Fig. 5. Comparison of predicted domain-level organization in polyprotein lb of new viruses to previously described nidoviruses. (A) Domains include the 
nidovirus RdRp-associated nucleotidyl transferase (NiRAN), RdRp, potential metal ion binding clusters with four cysteine or histidine residues in a window of 20 
amino acids (CH cluster), homologs of the domain of unknown function in the middle of coronavirus nspl3 (CoV nspl3b), superfamily 1 helicase (SF1 Helicase), 
nidovirus-specific exonuclease (ExoN) and uridylate-specific endonuclease (NEndoU), RNA cap N7 methyltransferase (N7 MTase) and RNA cap 2 / -0-methyl- 
transferase (20 MTase). (B) Domains of pp2 include the structural protease (S pro ), putative glycoproteins GP1, GP2 and GP3, and a nucleoprotein-like domain (N?), 
TMHMM-predicted transmembrane domains and SignalP-predicted signal peptidase cleavage sites. 


AAbV encodes at least one functional proteinase, but further work is 
needed to determine the cleavage specificity and map proteolytic pro¬ 
cessing by the AAbV M pro . 

2.7. AAbVpplab expression 

Another unusual feature of AAbV was the presence of an in-frame 
stop codon separating the ppla and pplb genes, rather than the ex¬ 
pected ribosomal frameshift signal found in most other nidoviruses. We 
note that an in-frame stop codon separates the putative ppla and pplb 
of the molluscan nidovirus Tunninivirus 1, which was phylogenetically 
grouped with AAbV and Alphamesonivirus 1 ( ug. 1). This suggested that 
AAbV may use a translational termination-suppression signal as a way 
to control expression of the pplb region. Termination-suppression sig¬ 
nals are found in several other viruses including alphaviruses and some 
retroviruses, and typically consist of a UAG or UGA stop codon followed 
by an RNA secondary structure element, and the efficiency of sup¬ 
pression normally depends on the stop codon, the nucleotides im¬ 
mediately following the stop codon, and the free energy of the RNA 
secondary structure element (Feng et al., 1992). The ppla gene of AAbV 
ends in a UGA stop codon, and the region that follows was predicted by 
Mfold ( Zuker, 2003) to be capable of forming several related RNA 
secondary structure elements, of which the most consistently predicted 
is shown in ug. A. A potential pseudoknot-like conformation in the 
same region is shown by Debat (Debat, 2018). 

To investigate protein expression at the ppla-pplb region, nucleo¬ 
tides 17255-17707 were cloned into pTriex 1.1 with amino-terminal 
HSV and carboxyl-terminal HIS tags. This construct would allow de¬ 
tection and quantification of the 25kDa proteins that stopped at the 
natural UGA stop codon that would have an HSV tag only, and 35 kDa 


readthrough products that would have both HSV and HIS tags. 
Expression of this construct produced the expected 25 kDa termination 
product and 35 kDa readthrough product ( ’ig. B-D). Based on densi¬ 
tometry analysis (not shown), it was estimated that 25-30% of trans¬ 
lation events resulted in readthrough. 

The choice of stop codon and elements of the two codons that follow 
have been shown to affect the efficiency of translational termination 
(Cridge et al., 2018; Skuzeski et al., 1991). To further investigate the 
AAbV termination-suppression signal, constructs were made in which 
the region around the ppla stop codon was perturbed from the wild- 
type UGAC, predicted to produce near optimal termination, to UAAA, 
predicted to produce much less than optimal termination. In another 
construct, 42 nucleotides predicted to form one side of the predicted 
RNA stem-loops were deleted (A42; Fig. A). Mutation of the AAbV 
ppla stop codon had little effect on readthrough efficiency (Fig. B), 
but deletion of 42 nucleotides predicted to be involved in RNA sec¬ 
ondary structures appeared to decrease readthrough, and led to a 
smaller readthrough product as predicted. Together these results in¬ 
dicate that the pplb region of AAbV is probably expressed by read- 
through of a UGA stop codon, mediated by a functional termination- 
suppression signal that is dependent on sequences following the stop 
codon. 

2.8. MLeV genome 

Microhyla letovirus is represented by a single assembly (accession 
number GECV01031551) of 22304 nucleotides that potentially encodes 
a partial corona-like virus from near the end of a protein equivalent to 
SARS-CoV nsp3 to the 3'-end ( ’ig. 8A). No other matches for this se¬ 
quence were found in the TSA or EST databases by nucleotide BLAST. 
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Fig. 6. Investigation of proteinase activity of AAbV M pro . The AAbV main 
proteinase (M pro ; A-B) and surrounding regions were expressed as HSV and HIS- 
tagged constructs as shown in panel A. A white triangle marks the expected size 
of the 52.5 kDa uncleaved M pro constructs. Black triangles mark the size of 
approximately 16 kDa amino-terminal cleavage products. Non-specific bands 
that were also present in control lanes are indicated with a star. 

The host organism of MLeV is shown in Fig. 8B. Mapping single se¬ 
quence reads onto the genome revealed a strong age dependence of 
MLeV detection. The number of fragments per kilobase of transcript per 
million mapped reads decreased by seven-fold from pre-metamorphosis 
to metamorphic climax, then decreased again by fourteen-fold from 
metamorphic climax to completion of metamorphosis. Further testing 
was done by reverse transcriptase polymerase chain reaction using 
MLeV-specific primers on the same population of adult frogs later in the 
year, but all the adult material tested was negative for MLeV (LZ, 
personal communication). 

The MLeV genome is missing the 5'-end of the genome, including a 
5'-untranslated region and sequences corresponding to coronavirus 


nspl, nsp2 and part of nsp3. The size of the missing part of the genome 
can be estimated at 1500-4000 nucleotides based on comparison to 
complete genomes from the relatively small deltacoronaviruses or the 
relatively large alphacoronaviruses. The MLeV genome contains a 572 
nucleotide 3'-untranslated region and an 18-nucleotide poly-adenosine 
tail. 

The genome organization of MLeV was similar to that of cor- 
onaviruses, with a predicted -1 ribosomal frameshift signal. Usually, a 
programmed -1 ribosomal frameshift signal consists of three elements: a 
slippery sequence that is most commonly UUUAAAC in coronaviruses, a 
stop codon for the upstream coding region, and a strong RNA secondary 
structure or pseudoknot. MLeV encodes a potential slippery sequence at 
nucleotide 6085 (UUUAAAC) followed immediately by a UAA stop 
codon for ppla. The region following the putative frameshift signal was 
predicted by Mfold to adopt a stem-loop conformation which may be 
part of an RNA pseudoknot (not shown), but further biological char¬ 
acterization is needed to determine the boundaries of the frameshifting 
region and test its frameshifting efficiency. 

The 3'-end of the MLeV genome contains six ORFs that could encode 
proteins of 50 or more amino acids, which presumably include the viral 
structural proteins. Five of the six 3'-end ORFs are preceded by a se¬ 
quence UCUAAHA (where H is any nucleotide except G), that resembles 
the UCUAAAC transcription regulatory sequence of the coronavirus 
mouse hepatitis virus. These candidate transcription-regulatory se¬ 
quences start 6-66 nucleotides before the AUG start codon of the next 
ORF. Without the 5'-end or any evidence of viral subgenomic RNAs, it is 
not possible to be certain how the 3'-end ORFs are expressed, but these 
repeated sequences are evidence that MLeV may express its structural 
proteins from subgenomic RNAs in the manner of coronaviruses. 
Unfortunately, the original RNA sample that was used for Microhyla 
fissipes transcriptomic analysis was completely consumed, and could 
not be further tested by RT-PCR. 

The first of these downstream ORFs encodes a large S-like protein of 
1526 amino acids with an amino-terminal signal peptide predicted by 
SignalP and a carboxyl-terminal transmembrane region predicted by 
TMHMM. The second and third ORFs appear to encode a unique single¬ 
pass transmembrane protein of 55 amino acids (ORF 2b) and a unique 
soluble 157 (ORF 3) amino acid protein, respectively, which are likely 
strain-specific accessory proteins. The fourth ORF encodes an E-like 
protein of 77 amino acids, with an amino-terminal predicted trans¬ 
membrane region followed by a potential amphipathic helix predicted 
by Amphipaseek (Sapay et al., 2006). The fifth ORF encodes a 241 
amino acid long three-pass transmembrane protein that resembles the 
coronavirus M protein, and the sixth ORF encodes a putative N protein 
of 459 amino acids. Together, these 3'-ORFs appear to encode a 
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Fig. 7. Mutational analysis of the termination-suppression signal (TSS) at the ORFla/b junction. (A) Schematic view of the TSS expression construct and 
introduced HSV and HIS tags, showing only predicted RNA secondary structures that were consistent in the best six models generated by Mfold. Mutations around the 
stop codon (bold, producing the UAAA construct) or removing one side of the predicted stem-loops (A42) are shown. (B-D) Western blots showing translation of 
mutant TSS expression constructs in a coupled T7 polymerase rabbit reticulocyte lysate expression system. Blots were probed with anti-HSV (B, D) to detect both 
25 kDa terminated and 32-35 kDa readthrough products, or with anti-HIS (C) to detect only readthrough products. 
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complete coronavirus functional repertoire, and are present in the same 
order found on all other currently known coronavirus genomes 
(Neuman and Buchmeier, 2016). The start codons of the putative S and 
M ORFs appear to overlap with the stop codons of preceding ORFs, 
indicating a relatively compact genome. 

To test whether there was support for MLeV subgenomic RNA species 
in the raw sequence data, individual sequence reads were mapped to the 
MLeV genome using the same method used for AAbV above (Fig. 9A). 


There was not a noticeable change in read depth at the junction between 
ORFs la and lb of MLeV, suggesting that polyprotein lb is expressed by a 
translational rather than transcriptional mechanism. However, there were 
two sudden increases of about eight-fold in read depth immediately before 
the start of the N ORF and near the beginning of the adjacent E and M 
ORFs (Fig. 9B). Expected increases in read depth before the putative S gene 
and the largest putative accessory gene were not detected. As with AAbV, 
many low-frequency sequence variants were detected in the raw sequence 
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Fig. 10. Speculative annotation of nidovirus structural proteins. Where structures or functions were not known, proteins were categorized according to general 
PSIPRED secondary structure profile. Marked domains include coronavirus spike protein homologs (Spike) and structurally similar regions (|3-a), alphavirus El 
homologs (El) and structurally similar regions (|3a{3), coronavirus envelope-like proteins (E-like), coronavirus membrane proteins (M-like) and structurally similar 
proteins ({3), potential nucleoprotein (N-like), chymotrypsin-like structural proteinase (S pro ), similar to the bovine viral diarrhea virus structural RNAse (BVDV 
RNAse), proteins related to influenza A virus hemagglutinin (HA) or torovirus hemagglutinin-esterase (HE), other viral surface glycoproteins (GP-like), domains of no 
known function (Unknown), SignalP-predicted signal peptidase cleavage sites (SP cleavage), and potential sites cleaved by unknown proteinases by analogy to other 
nidovirus structural proteins. 


data, but no indels were consistently present in the region surrounding the 
putative transcription-regulatory sequences. These data suggest that at 
least the M and N genes of MLeV are expressed via subgenomic mRNAs. 

2.9. MLeV protein bioinformatics 

In the ppla region, HHPred detected matches for conserved cor¬ 
onavirus domains including the carboxyl-terminal domain of cor¬ 
onavirus nsp4, M pro , nsp7, nsp8, nsp9 and nsplO (Fig. 8C). In the pplb 
region, HHPred detected matches for a picornavirus-like RdRp, the 
nspl3 metal-binding helicase, the nspl4 ExoN-N7 MTase, the nspl5 
NEndoU, and the nspl6 20 MTase. In the structural protein region, 
HHPred detected a match for the amino-terminal domain of cor¬ 
onavirus N in the putative MLeV N protein. 

As with AAbV, we then widened our search to include conserved 
coronavirus domains that do not yet have known protein structures. 
This led to a match for the carboxyl-terminal region of nsp3, amino- 
terminal region of nsp4, nsp6, the nspl2 NiRAN domain, and a match 
between coronavirus M and the proposed MLeV M protein. Neither the 
proposed MLeV S nor E protein could be further corroborated by 
bioinformatics tools. Together, this indicated that MLeV appears to 
encode a complete set of conserved coronavirus-like proteins from the 
carboxyl-terminal region of nsp3 through the end of the genome. 

3. Discussion and conclusions 

With the addition of MLeV, AAbV and a host of other recently- 
published highly divergent nidoviruses, the field of nidovirus evolution 
is due for a revision, which will require a detailed approach and that 
will fit best in another study. However, a few tentative conclusions can 
be drawn from these new viruses. 

Firstly, the new viruses confirm that the region of ppla up to the 
SARS-CoV nsp4 equivalent, which seems to contain a variety of anti¬ 
host countermeasures in the viruses where this region has been studied 
(Neuman et al., 2014), is highly variable and does not appear to contain 
any universally-conserved domains. As previously noted (Lauber et al., 
2013), this part of the genome appears to have the most genetic flex¬ 
ibility, even within viral genera, and likely has great relevance to those 
studying interactions between viruses and innate immunity (Bailey- 
Elkin et al., 2014; Lokugamage et al., 2015; Mielech et al., 2014). It is 
worth noting that the region preceding the M pro in AAbV is over 13 kb - 
larger than most other complete RNA virus genomes. 

Secondly, two elements of genome architecture seem to be 


conserved throughout the Nidovirales: a M pro flanked by multi-pass 
transmembrane regions, and the block containing NiRAN-RNA poly¬ 
merase-metal binding-Helicase. Knowledge of these apparent nidovirus 
genetic synapomorphies should make it possible to design searches to 
detect even more divergent nido-like viruses in transcriptomes. 

Thirdly, the NendoU domain appears to be found only in viruses 
infecting vertebrate animals, and is lacking in every known nidovirus- 
like genome from an invertebrate host. This suggests that the function 
of NendoU may have evolved as a countermeasure to conserved me¬ 
tazoan viral RNA recognition machinery involved in innate immunity 
(Lokugamage et al., 2015). 

Fourthly, while most currently known nidovirus species are asso¬ 
ciated with terrestrial hosts, the greatest phylogenetic diversity of ni¬ 
doviruses is now associated with hosts that live in aquatic environ¬ 
ments. Since terrestrial metazoan transcriptomes are relatively well- 
sampled in comparison to aquatic and particularly marine metazoa, we 
would predict this trend is likely to continue. Of the eight proposed 
nidovirus families shown in Figs. 4 and 5, four contain only viruses 
associated with aquatic hosts, two ( Arteriviridae (Shi et al., 2018) and 
the proposed Tobaniviridae) are found in a mix of strictly aquatic and 
strictly terrestrial animals, and two ( Coronaviridae , Mesoniviridae ) are in 
part associated with hosts such as mosquitoes and frogs that have an 
obligate aquatic larval phase. Taken together, this data suggests that it 
may be useful to consider potential routes of interspecies transmission 
between marine, freshwater and terrestrial hosts in future studies of 
nidovirus evolution, as more data becomes available. 

Lastly, the structural protein repertoire of nidoviruses appears to be 
quite broad compared to other known virus orders. There do not appear 
to be any conserved nidovirus structural proteins with the possible 
exception of the nucleoprotein (discussed elsewhere (Neuman and 
Buchmeier, 2016)), and even that homology can only be regarded as 
hypothetical until more structures of putative nucleoproteins are 
solved. A tentative categorization of nidovirus structural proteins, 
based on size, predicted transmembrane regions, and predicted protein 
secondary structure is shown in Fig. 10. If correct, this would indicate 
that nidoviruses have a diverse set of structural proteins that includes a 
variety of possibly unrelated spike-like proteins plus components 
shared with Orthomyxoviridae (HA and HE), Togaviridae (El and the E3 
structural serine proteinase), Flaviviridae (the capsid RNAse). This 
structural repertoire appears to be variously expressed from sub¬ 
genomic RNAs encoding a single gene (as proposed for MLeV), giant 
polyproteins such as that of AAbV, and a mix of intermediate-sized 
polyproteins and single genes, as in the Roniviridae. Taken together, 
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these observations suggest that structural proteins are widely shared 
and exchanged among RNA viruses, and that conserved elements of the 
replicase will be more useful than structural proteins for anyone trying 
to construct trees that connect viruses at taxonomic ranks above the 
family level. 

4. Materials and methods 

4.1. Phytogeny 

Nidovirus phylogeny was reconstructed based on MSA of con¬ 
catenated M pro , NiRAN, RdRp, CH cluster and SF1 Helicase conserved 
cores(3417-3905, 5441-5866, 6095-7291, 7340-7504, 7781-8545 nt 
of the Equine arteritis virus genome X53459.3), prepared with the help 
of Viralis platform (Gorbalenya et al., 2010). Representatives of 28 
nidovirus species (Supplementary table 1) delineated in recent ICTV 
proposals (Brinton et al., 2017; Gorbalenya et al., 2017b, 2017a; 
Ziebuhr et al., 201 r ) were used. Phylogeny was reconstructed by IQ 
Tree 1.5.5 using a partition model where the evolutionary model for 
each of the five domains was selected by ModelFinder (Chernomor 
et al., 2016; Kalyaanamoorthy et al., 2017; Nguyen et al., 2015). To 
estimate branch support, Shimodaira-Hasegawa-like approximate like¬ 
lihood ratio test (SH-aLRT) with 1000 replicates was conducted. The 
tree was midpoint rooted and visualized with the help of R packages 
APE 3.5 and phangorn 2.0.4 (Paradis et al., 2004; R Development Core 
Team, 2011; Schliep, 2011). 

4.2. Protein assays 

Nucleotides 12926-14176 containing the AAbV M pro and flanking 
regions extending to the preceding and following predicted trans¬ 
membrane regions was produced as a synthetic GeneArt Strings DNA 
fragment (Invitrogen). This was used as the template in a 50 pi PCR 
reaction using primers Aby_IF_MP_F (CCCCGAGGATCTCGAGTTGCGA 
ATGATTTTGTCTACC) and Aby_IF_MP_R (GATGGTGGTGCTCGAGACA 
CAGACAACACAACAAAAA) with lx Phusion High Fidelty PCR 
Mastermix (Thermo Fisher Scientific). The 1283 bp PCR product was 
gel extracted using a QIAquick gel extraction kit (Qiagen) and cloned 
into pTriExl.l (Novagen/Merck) linearised with Xhol using In-Fusion 
HD cloning reagents (Clontech). 2 pi of the In-Fusion reaction was 
transformed into Stellar chemically competent cells as per the manu¬ 
facturers protocol (Clontech) and selected on LB agar containing 
100 pg/mL ampicillin. The final construct with a T7 RNA polymerase 
promoter and in-frame amino-terminal HSV and carboxyl-terminal HIS 
tags was verified by Sanger sequencing (Source Bioscience) of plasmid 
DNA purified using a QIAquick spin miniprep kit (Qiagen). Site-di¬ 
rected mutagenesis was carried out using the Quikchange II (Agilent) 
reagents and protocol. Protein expression was carried out in a 50 pi 
reaction volume using 0.5 pg of plasmid DNA with the TnT® Quick 
Coupled Transcription/Translation System (Promega) reagents and 
protocol. In vitro transcription and translation was carried out for 1 h. 

Samples containing expressed proteins were mixed with an equal 
volume of 2x SDS PAGE loading buffer containing 100 mM Tris-HCL 
pH6.8, 4% w/v SDS, 20% w/v glycerol, 0.2% bromophenol blue, 2% j3- 
mercaptoethanol. Samples were boiled at 100 °C for 10 min, collected 
by gentle centrifugation, and loaded in Mini-PROTEAN precast poly¬ 
acrylamide gels (BioRad). After electrophoresis, proteins were blotted 
to PVDF membranes for 80 min at 150 mA using a Trans-Blot Turbo 
(BioRad). Membranes were blocked overnight at 4 °C with 5% (w/v) 
non-fat milk powder in TBST (50 mM Tris, 150 mM NaCl, 0.1% Tween 
20, pH 7.5). Membranes were then washed three times for 5 min each 
on a rocking platform at 25 rpm with TBST buffer before addition un¬ 
conjugated rabbit anti-HIS tag monoclonal antibody (Abeam) or un¬ 
conjugated rabbit anti-HSV tag monoclonal antibody (Abeam) for 1 h. 
Membranes were again washed three times for 5 min each with TBST 
buffer before addition of horseradish peroxidase-conjugated goat anti¬ 


rabbit secondary antibody for 1 h. For detection, ChemiFast chemilu¬ 
minescent reagent (Syngene) was used to detect bound secondary an¬ 
tibody. Samples were visualized using a Syngene Chemi XL G:Box gel 
documentation system. Gel images were cropped and brightness and 
contrast of images was adjusted using GIMP software (GIMP team). 

The region from the ppla-pplb junction containing the putative ter¬ 
mination-suppression signal of AAbV, nucleotides 17255-17707, was PCR 
amplified from a synthetic GeneArt Strings fragment (Invitrogen) using 
primers Aby_IF_SS_F (CCCCGAGGATCTCGAGGAGTCTTGTCGTGTGAAGT) 
and Aby_IF_SS_R (GATGGTGGTGCTCGAGAGGATTAATCCGTCTGTCAA). 
The predicted S pro -containing region of AAbV, nucleotides 25918-27183, 
was PCR amplified from a synthetic GeneArt Strings fragment (Invitrogen) 
using primers Aby_IF_TryP_R (GATGGTGGTGCTCGAGCGGTTTGTTCGCA 
TACAGA) and Aby_IF_TryP_R (GATGGTGGTGCTCGAGCGGTTTGTTCGCA 
TACAGA). Both the S pro and putative ppla-pplb termination-suppression 
signal products were cloned, expressed and detected in the same way as 
AAbV M pro . 

4.3. Microhyla prevalence 

Data for the MLeV prevalence study comes from a published report 
( "hao et al., 2016). Briefly, nine tadpoles were sacrificed, using three 
individuals from each of the three developmental stages as independent 
biological replicates. One microgram of mRNA of each stage sample 
was sequenced on an Illumina HiSeq. 2000 platform by NovoGene 
(Beijing), and paired-end reads were generated. 
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